Maximum entropy (maxEnt) inference of state probabilities using state-dependent constraints is popular in the study of complex systems. In stochastic dynamical systems, the effect of state space topology and path-dependent constraints on the inferred state probabilities is unknown. To address this, we derive the transition probabilities and the stationary distribution of a maximum path entropy Markov process subject to state- and path-dependent constraints. The stationary distribution reflects a competition between path multiplicity and imposed constraints and is significantly different from the Boltzmann distribution. We illustrate our results with a particle diffusing on an energy landscape. Connections with the path integral approach to diffusion are discussed.


Owing to our increasing ability to collect large amounts of data in complex systems, and our inability to construct generative models to explain those data, descriptive approaches have become popular. One such framework is the principle of maximum entropy (maxEnt) […]. Intuitively, maxEnt picks the 'least informative' distribution over states while requiring it to reproduce certain aspects of the data. The result is a Boltzmann distribution in the constrained quantities. maxEnt has been employed to study a variety of problems, for example, neuronal firing patterns […], bird flocks […], ecological species distributions […], gene expression noise […], sequence variability in proteins […], and behavior […].

In many cases […], but not always […], the experimental data are a realization of a stochastic process. In such cases, one may wish to impose path-dependent, current-like constraints in addition to state-dependent constraints. Moreover, the dynamical radius of any state (the set of states reachable in a single transition) is usually finite, which defines the state space topology. How these factors affect inferred state probabilities is unknown.

We solve this problem for Markovian dynamics in discrete state and time. In order to incorporate dynamical information, we maximize a path entropy. We derive the transition probabilities and the stationary distribution of the maximum path entropy Markov process subject to state- and path-dependent constraints. The stationary distribution is the product of the left and right Perron-Frobenius eigenvectors of a matrix and depends non-trivially on the topology and the imposed constraints. We illustrate our results with a random walk diffusing on a two-dimensional energy landscape.

We begin with an observation. Discrete-state stochastic systems can be modeled by a random walk in higher dimensions. For example, the time evolution of an Ising model with $N$ spins is a random walk on an $N$-dimensional hypercube with $2^N$ vertices. If at most one spin flip per transition is allowed, as in the popular Glauber dynamics […], every state is connected to only $N$ out of the $2^N$ states. With this in mind, we consider an irreducible and aperiodic discrete-time Markovian random walk on a directed graph $G$ with nodes $V$ and edges $E$. We denote the unique stationary distribution over the states by $\{p_a\}$. We assume that the transition probabilities $k_{ab} \neq 0$ only when $(a,b) \in E$.
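The connectivity claim for the Ising example can be checked by enumeration. The sketch below is a minimal illustration. It assumes spins are stored as tuples of $\pm 1$ and that the edges of $G$ are single-spin flips, as in Glauber-type dynamics. `ising_states` and `single_flip_neighbors` are hypothetical helper names.

```python
from itertools import product


def ising_states(n_spins):
    """All 2**n_spins spin configurations, each a tuple of +1/-1."""
    return list(product((-1, 1), repeat=n_spins))


def single_flip_neighbors(state):
    """States reachable by flipping exactly one spin (the edges of G)."""
    return [state[:i] + (-state[i],) + state[i + 1:] for i in range(len(state))]


n_spins = 4
states = ising_states(n_spins)
degrees = {s: len(single_flip_neighbors(s)) for s in states}
print(len(states), set(degrees.values()))  # prints: 16 {4}
```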

We seek the maximum entropy stationary distribution subject to state- and path-dependent constraints. The appropriate ensemble for imposing these constraints is the ensemble $\{\Gamma\}$ of stationary state trajectories $\Gamma: a \to b \to \cdots$ of fixed but unspecified duration $T$. We only consider trajectories that are permitted by the state space topology. The entropy of the ensemble, normalized by $T$, is given for long trajectories ($T \to \infty$) by […]

$$
S = -\frac{1}{T}\sum_{\Gamma} P(\Gamma)\log P(\Gamma) = -\sum_{a,b} p_a k_{ab}\log k_{ab}. \tag{1}
$$

In Eq. (1) and from here onward, unless specified otherwise, all summations involving quantities with two indices are restricted to the edges of the graph.
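The identity in Eq. (1) can be checked by brute force. The sketch below is a minimal illustration that assumes a two-state chain with made-up transition probabilities. It enumerates every trajectory of $T$ transitions started from the stationary distribution and compares the exact path entropy with $T S$ plus the entropy of the initial state. The initial-state term is why the normalized path entropy approaches $S$ only as $T$ grows.

```python
import math
from itertools import product

# Hypothetical two-state chain: k[a][b] is the probability of a -> b.
k = [[0.9, 0.1],
     [0.3, 0.7]]

# Stationary distribution of a two-state chain: p_0 = k_10 / (k_01 + k_10).
p = [k[1][0] / (k[0][1] + k[1][0]), k[0][1] / (k[0][1] + k[1][0])]


def entropy_rate(p, k):
    """S = -sum_{a,b} p_a k_ab log k_ab, Eq. (1); pairs with k_ab = 0 are skipped."""
    return -sum(p[a] * k[a][b] * math.log(k[a][b])
                for a in range(len(p)) for b in range(len(p)) if k[a][b] > 0)


def path_entropy(p, k, T):
    """Exact -sum_Gamma P(Gamma) log P(Gamma) over all paths with T transitions."""
    total = 0.0
    for path in product(range(len(p)), repeat=T + 1):
        prob = p[path[0]]
        for a, b in zip(path, path[1:]):
            prob *= k[a][b]
        if prob > 0:
            total -= prob * math.log(prob)
    return total


S = entropy_rate(p, k)
H0 = -sum(x * math.log(x) for x in p)  # entropy of the initial (stationary) state
for T in (1, 5, 10):
    # Path entropy equals H0 + T * S exactly, so dividing by T tends to S.
    print(T, path_entropy(p, k, T) / T, S + H0 / T)
```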

$\{p_a\}$ and $\{k_{ab}\}$ are not independent of each other. In fact, they are constrained as follows:

$$
\sum_b p_a k_{ab} = p_a, \qquad \sum_a p_a k_{ab} = p_b, \qquad \sum_{a,b} p_a k_{ab} = 1. \tag{2}
$$

If the dynamics is reversible, the walk also satisfies the detailed balance conditions

$$
p_a k_{ab} = p_b k_{ba}. \tag{3}
$$

Let us introduce constraints on path ensemble averages of state- and path-dependent quantities $r^i_{ab}$. State-dependent quantities, such as energy and particle number, depend only on the initial state $a$ or the final state $b$. Path-dependent quantities, such as energy or particle currents, depend on both states. The path ensemble averages are given by […]

$$
\langle r^i \rangle = \sum_{a,b} p_a k_{ab} r^i_{ab}. \tag{4}
$$
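Eq. (4) is an average over the ensemble of stationary paths. For an ergodic chain it should therefore agree with a time average along one long trajectory. The sketch below illustrates this for a hypothetical three-state chain and an antisymmetric, current-like quantity $r_{ab}$. The matrices, seed, and trajectory length are arbitrary choices. For a state-dependent quantity $r_{ab} = \epsilon_a$, Eq. (4) reduces to $\sum_a p_a \epsilon_a$ because $\sum_b k_{ab} = 1$.

```python
import random

# Hypothetical 3-state chain with self-loops (so it is aperiodic).
k = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]
# Hypothetical path-dependent quantity: antisymmetric, r_ab = -r_ba.
r = [[0.0, 1.0, -1.0],
     [-1.0, 0.0, 1.0],
     [1.0, -1.0, 0.0]]


def stationary(k, iters=2000):
    """Fixed point of p = p k by repeated multiplication (chain assumed aperiodic)."""
    n = len(k)
    p = [1.0 / n] * n
    for _ in range(iters):
        p = [sum(p[a] * k[a][b] for a in range(n)) for b in range(n)]
    return p


def ensemble_average(p, k, r):
    """<r> = sum_{a,b} p_a k_ab r_ab, Eq. (4)."""
    n = len(p)
    return sum(p[a] * k[a][b] * r[a][b] for a in range(n) for b in range(n))


def time_average(k, r, steps, seed=0):
    """Average of r_ab over the transitions of one simulated trajectory."""
    rng = random.Random(seed)
    a, total = 0, 0.0
    for _ in range(steps):
        b = rng.choices(range(len(k)), weights=k[a])[0]
        total += r[a][b]
        a = b
    return total / steps


p = stationary(k)
estimate = time_average(k, r, 200_000)
print(ensemble_average(p, k, r), estimate)
```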

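We maximize the path entropy $S$ in Eq. (1) with respect to the unknown stationary distribution $p_a$ and transition probabilities $k_{ab}$ while imposing the constraints in Eqs. (2) and (4). Using Lagrange multipliers, we write the unconstrained Lagrange function, sometimes called the Caliber […],

$$
\begin{aligned}
\mathcal{C} = S &+ \sum_a m_a\Big(\sum_b p_a k_{ab} - p_a\Big) + \sum_b n_b\Big(\sum_a p_a k_{ab} - p_b\Big) \\
&+ \delta\Big(\sum_{a,b} p_a k_{ab} - 1\Big) - \sum_i \gamma_i\Big(\sum_{a,b} p_a k_{ab} r^i_{ab} - \langle r^i\rangle\Big).
\end{aligned} \tag{5}
$$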






FIG. 1. Energy landscape on an $N \times N$ square lattice with $N = 40$. Energy is highest at the center of the lattice and decreases roughly as the reciprocal of the squared distance from the center (see Eq. (14)). We have chosen $A = 11$ and $B = 10$.

Maximizing the Caliber with respect to $p_a$ and $k_{ab}$, we find that the transition probabilities $k_{ab}$ are given by (see the Appendix for details)

$$
k_{ab} = \frac{\phi_b}{\eta\,\phi_a}\,W_{ab}, \tag{6}
$$

where the elements of the constraint matrix $W$ are given by

$$
W_{ab} = \exp\Big(-\sum_i \gamma_i r^i_{ab}\Big) \tag{7}
$$

when $(a,b) \in E$ and zero otherwise. Here $\phi$ is the normalized eigenvector of $W$ corresponding to its maximum eigenvalue $\eta$. The Perron-Frobenius theorem guarantees that $\phi$ is strictly positive and that $\eta$ is positive and non-degenerate. A simple case of Eq. (6), for a freely diffusing random walk, was studied by Burda et al. […]; there $W$ is equal to the adjacency matrix of the graph $G$.
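The sketch below evaluates Eq. (6) numerically, using the constraint matrix $W_{ab} = \exp(-\gamma r_{ab})$ on edges and zero elsewhere. It is a minimal example under several assumptions: a small hypothetical graph, a single state-dependent constraint $r_{ab} = \epsilon_a$, and an arbitrary multiplier $\gamma = 0.7$. The right Perron-Frobenius pair is found by power iteration on $W + I$. Adding the identity leaves the eigenvectors unchanged and keeps the iteration from oscillating on bipartite (periodic) graphs.

```python
import math

# Hypothetical graph: the 4-cycle 0-1-2-3-0 plus the chord 0-2, all bidirectional.
edges = {(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)}
edges |= {(b, a) for a, b in edges}
n = 4
energy = [0.0, 1.0, 0.5, 2.0]   # state-dependent quantity, r_ab = energy[a]
gamma = 0.7                     # Lagrange multiplier (arbitrary value)

# Constraint matrix: W_ab = exp(-gamma r_ab) on edges, 0 otherwise.
W = [[math.exp(-gamma * energy[a]) if (a, b) in edges else 0.0 for b in range(n)]
     for a in range(n)]


def perron(M, tol=1e-13, max_iter=100_000):
    """Perron eigenvalue and max-normalized right eigenvector of a non-negative,
    irreducible matrix M, via power iteration on M + I."""
    size = len(M)
    v = [1.0] * size
    for _ in range(max_iter):
        new = [sum(M[i][j] * v[j] for j in range(size)) + v[i] for i in range(size)]
        top = max(new)
        new = [x / top for x in new]
        converged = max(abs(x - y) for x, y in zip(new, v)) < tol
        v = new
        if converged:
            break
    eta = sum(M[0][j] * v[j] for j in range(size)) / v[0]
    return eta, v


eta, phi = perron(W)
# Transition probabilities, Eq. (6): k_ab = phi_b W_ab / (eta phi_a).
k = [[phi[b] * W[a][b] / (eta * phi[a]) for b in range(n)] for a in range(n)]
print(eta, [sum(row) for row in k])  # every row of k sums to 1
```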

The stationary distribution $\{p_a\}$ can be determined by solving the linear system of equations

$$
\sum_a p_a k_{ab} = p_b. \tag{8}
$$

Thus, if $\psi$ is the left Perron-Frobenius eigenvector and $\phi$ is the right Perron-Frobenius eigenvector of $W$ with the same eigenvalue $\eta$, the stationary distribution is given by the product

$$
p_a = \psi_a\,\phi_a, \tag{9}
$$

with the eigenvectors normalized so that $\sum_a \psi_a\phi_a = 1$. Substituting Eq. (6) into Eq. (8) confirms this: $\sum_a \psi_a\phi_a\,\phi_b W_{ab}/(\eta\,\phi_a) = \phi_b\sum_a \psi_a W_{ab}/\eta = \psi_b\phi_b$. The Perron-Frobenius eigenvectors, and thus the stationary distribution, depend on the topology and the imposed constraints in a non-trivial fashion. In other words, the Boltzmann distribution, obtained by maximizing the entropy over state distributions, is no longer guaranteed when dynamical information is introduced.
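The recipe of Eqs. (6), (8), and (9) can be checked numerically for an asymmetric constraint matrix. This minimal sketch assumes a hypothetical four-state complete graph without self-loops, an arbitrary asymmetric $r_{ab}$, and $\gamma = 1$. The left eigenvector is obtained as the right eigenvector of $W^{\mathsf{T}}$, and the final line checks Eq. (8).

```python
import math

n, gamma = 4, 1.0
# Hypothetical asymmetric path quantity (r_ab != r_ba) on a complete graph.
r = [[0.0, 1.0, 0.2, 0.5],
     [0.3, 0.0, 1.5, 0.1],
     [0.9, 0.4, 0.0, 1.2],
     [0.2, 0.8, 0.6, 0.0]]
W = [[math.exp(-gamma * r[a][b]) if a != b else 0.0 for b in range(n)] for a in range(n)]


def perron(M, tol=1e-13, max_iter=100_000):
    """Perron eigenvalue and right eigenvector of M via power iteration on M + I."""
    size = len(M)
    v = [1.0] * size
    for _ in range(max_iter):
        new = [sum(M[i][j] * v[j] for j in range(size)) + v[i] for i in range(size)]
        top = max(new)
        new = [x / top for x in new]
        converged = max(abs(x - y) for x, y in zip(new, v)) < tol
        v = new
        if converged:
            break
    return sum(M[0][j] * v[j] for j in range(size)) / v[0], v


eta, phi = perron(W)                               # right eigenvector
_, psi = perron([list(col) for col in zip(*W)])    # left eigenvector = right of W^T
norm = sum(x * y for x, y in zip(psi, phi))
p = [x * y / norm for x, y in zip(psi, phi)]       # Eq. (9), normalized
k = [[phi[b] * W[a][b] / (eta * phi[a]) for b in range(n)] for a in range(n)]

# Stationarity, Eq. (8): sum_a p_a k_ab should reproduce p_b.
print(p, [sum(p[a] * k[a][b] for a in range(n)) for b in range(n)])
```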

Is the inferred Markov process reversible? Let us calculate its entropy production rate $s$ […],

$$
s = \sum_{a,b} p_a k_{ab}\log\frac{k_{ab}}{k_{ba}} = \sum_{a,b} p_a k_{ab}\sum_i \gamma_i\big(r^i_{ba} - r^i_{ab}\big). \tag{10}
$$

The terms involving $\phi$ cancel because of stationarity, Eq. (8). In Eq. (10), only the antisymmetric part of the constraints $r^i_{ab}$ contributes to entropy production. If all constraints are symmetric, the entropy production is zero and the Markov process is reversible. In fact, if microscopic reversibility (Eq. (3)) is explicitly imposed, the inference problem is equivalent to constraining the symmetrized quantities $\bar r^i_{ab} = \frac{1}{2}\big(r^i_{ab} + r^i_{ba}\big)$ (see the Appendix for details). In this case, the constraint matrix $W$ is symmetric and the left and right Perron-Frobenius eigenvectors coincide. The stationary distribution is simply the square of this eigenvector.
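Eq. (10) can be compared against a direct evaluation of $\sum_{a,b} p_a k_{ab}\log(k_{ab}/k_{ba})$. The sketch below uses a hypothetical four-state complete graph as a toy example. It also checks two consequences of symmetrizing the constraint, $\bar r_{ab} = (r_{ab}+r_{ba})/2$: the entropy production vanishes and $p_a \propto \phi_a^2$.

```python
import math


def perron(M, tol=1e-13, max_iter=100_000):
    """Perron eigenvalue and right eigenvector of M via power iteration on M + I."""
    size = len(M)
    v = [1.0] * size
    for _ in range(max_iter):
        new = [sum(M[i][j] * v[j] for j in range(size)) + v[i] for i in range(size)]
        top = max(new)
        new = [x / top for x in new]
        converged = max(abs(x - y) for x, y in zip(new, v)) < tol
        v = new
        if converged:
            break
    return sum(M[0][j] * v[j] for j in range(size)) / v[0], v


def max_path_entropy_chain(r, gamma):
    """p, k and phi from Eqs. (6) and (9) for W_ab = exp(-gamma r_ab), a != b."""
    size = len(r)
    W = [[math.exp(-gamma * r[a][b]) if a != b else 0.0 for b in range(size)]
         for a in range(size)]
    eta, phi = perron(W)
    _, psi = perron([list(col) for col in zip(*W)])
    norm = sum(x * y for x, y in zip(psi, phi))
    p = [x * y / norm for x, y in zip(psi, phi)]
    k = [[phi[b] * W[a][b] / (eta * phi[a]) for b in range(size)] for a in range(size)]
    return p, k, phi


def entropy_production(p, k):
    """s = sum_{a != b} p_a k_ab log(k_ab / k_ba)."""
    size = len(p)
    return sum(p[a] * k[a][b] * math.log(k[a][b] / k[b][a])
               for a in range(size) for b in range(size) if a != b)


n, gamma = 4, 1.0
r = [[0.0, 1.0, 0.2, 0.5],
     [0.3, 0.0, 1.5, 0.1],
     [0.9, 0.4, 0.0, 1.2],
     [0.2, 0.8, 0.6, 0.0]]
p, k, phi = max_path_entropy_chain(r, gamma)
s_direct = entropy_production(p, k)
s_eq10 = sum(p[a] * k[a][b] * gamma * (r[b][a] - r[a][b])
             for a in range(n) for b in range(n) if a != b)

r_sym = [[0.5 * (r[a][b] + r[b][a]) for b in range(n)] for a in range(n)]
p_s, k_s, phi_s = max_path_entropy_chain(r_sym, gamma)
z = sum(x * x for x in phi_s)
print(s_direct, s_eq10, entropy_production(p_s, k_s))
```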

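Finally, we write down the probability of an arbitrary path $\Gamma = a_1 \to a_2 \to a_3 \to \cdots \to a_n$ of total duration $n$. If the initial state $a_1$ is chosen from a distribution $p_0(a_1)$, we have

$$
P(\Gamma) = p_0(a_1)\,k_{a_1a_2}\,k_{a_2a_3}\cdots k_{a_{n-1}a_n} \tag{11}
$$

$$
= p_0(a_1)\,\frac{\phi_{a_n}}{\phi_{a_1}}\,\frac{e^{-\mathcal{A}(\Gamma)}}{\eta^{\,n-1}}, \tag{12}
$$

where $\mathcal{A}(\Gamma)$ is the 'action' associated with the path $\Gamma$ and is given by

$$
\mathcal{A}(\Gamma) = \sum_{t=1}^{n-1}\sum_i \gamma_i\, r^i_{a_t a_{t+1}}. \tag{13}
$$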
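The telescoping step from Eq. (11) to Eq. (12) can be verified on a randomly drawn path. The sketch assumes a hypothetical four-state complete graph, an arbitrary initial distribution `p0`, and a single constraint with multiplier $\gamma$. The seed and path length are arbitrary.

```python
import math
import random


def perron(M, tol=1e-13, max_iter=100_000):
    """Perron eigenvalue and right eigenvector of M via power iteration on M + I."""
    size = len(M)
    v = [1.0] * size
    for _ in range(max_iter):
        new = [sum(M[i][j] * v[j] for j in range(size)) + v[i] for i in range(size)]
        top = max(new)
        new = [x / top for x in new]
        converged = max(abs(x - y) for x, y in zip(new, v)) < tol
        v = new
        if converged:
            break
    return sum(M[0][j] * v[j] for j in range(size)) / v[0], v


n, gamma = 4, 1.0
r = [[0.0, 1.0, 0.2, 0.5],
     [0.3, 0.0, 1.5, 0.1],
     [0.9, 0.4, 0.0, 1.2],
     [0.2, 0.8, 0.6, 0.0]]
W = [[math.exp(-gamma * r[a][b]) if a != b else 0.0 for b in range(n)] for a in range(n)]
eta, phi = perron(W)
k = [[phi[b] * W[a][b] / (eta * phi[a]) for b in range(n)] for a in range(n)]
p0 = [0.1, 0.2, 0.3, 0.4]   # arbitrary initial distribution

# Draw a path of 10 states that only uses edges of the graph (no self-loops).
rng = random.Random(1)
path = [0]
for _ in range(9):
    path.append(rng.choice([b for b in range(n) if b != path[-1]]))

# Eq. (11): direct product of transition probabilities.
direct = p0[path[0]]
for a, b in zip(path, path[1:]):
    direct *= k[a][b]

# Eqs. (12)-(13): telescoped form with the action A(Gamma).
action = gamma * sum(r[a][b] for a, b in zip(path, path[1:]))
telescoped = (p0[path[0]] * phi[path[-1]] / phi[path[0]]
              * math.exp(-action) / eta ** (len(path) - 1))
print(path, direct, telescoped)
```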


Our construction of the maximum path entropy Markov process and its stationary distribution is complete. While it gives us a recipe to calculate the stationary distribution, Eq. (9) does not give an intuitive understanding of how the distribution depends on topology and constraints. Below, we illustrate three important features that are unique to path entropy maximization: path entropy/enthalpy compensation, the effect of state space topology, and the effect of currents.

In an illustrative example, we consider a particle diffusing on an $N \times N$ square lattice. In a single transition, the particle jumps to one of its nearest neighbors. We define the energy at every point $a = (x, y)$, with $x$ and $y$ measured from the center of the lattice, as

$$
\epsilon_a = \frac{A}{x^2 + y^2 + B}. \tag{14}
$$

$A$ and $B$ are positive constants. Below, we fix $A = 11$ and $B = 10$. The energy function is symmetric in $x$ and $y$, has a peak in the middle of the lattice, and takes its lowest values in the four corners (see Fig. 1).

FIG. 2. Stationary probabilities $p_a$ on a finite square lattice when average energy constraints are imposed. The particle localizes in the center of the lattice in the absence of constraints ($\gamma = 0$, left panel). When average energy constraints are used, the particle finds a balance between the multiplicity of paths and the energetics of the states (center and right panels).

First, let us assume that the square lattice is non-periodic, that is, it has hard boundaries. Corner points, edge points, and interior points have 2, 3, and 4 nearest neighbors, respectively. Let us obtain the stationary distribution with constraints on the average energy and detailed balance. We first construct the symmetric constraint matrix

$$
W_{ab} = \exp\Big[-\gamma\,\frac{\epsilon_a + \epsilon_b}{2}\Big] \tag{15}
$$

when $a$ and $b$ are nearest neighbors on the lattice and zero otherwise. Here $\gamma$ is the Lagrange multiplier associated with the average energy constraint. The energy enters in symmetrized form because detailed balance is imposed. We then find $\phi$, its right Perron-Frobenius eigenvector. The stationary distribution is $p_a \propto \phi_a^2$.
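The following is a minimal sketch of this calculation. It assumes site coordinates measured from the lattice center, $x = i - (N-1)/2$ (our reading of Eq. (14)), and uses a smaller lattice ($N = 12$) than in Fig. 2 so that plain Python runs quickly. The helper names are hypothetical. At $\gamma = 0$, $W$ is the adjacency matrix of the grid. Its Perron vector is then the product of sines $\sin(\pi(i+1)/(N+1))\sin(\pi(j+1)/(N+1))$, which gives a convenient check of the central localization.

```python
import math


def grid_neighbors(N):
    """Nearest-neighbor lists on a non-periodic N x N grid; site (i, j) -> i * N + j."""
    nbrs = []
    for i in range(N):
        for j in range(N):
            nbrs.append([a * N + b for a, b in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1))
                         if 0 <= a < N and 0 <= b < N])
    return nbrs


def energies(N, A=11.0, B=10.0):
    """Eq. (14) with x, y measured from the lattice center (assumed convention)."""
    c = (N - 1) / 2
    return [A / ((i - c) ** 2 + (j - c) ** 2 + B) for i in range(N) for j in range(N)]


def stationary_detailed_balance(nbrs, eps, gamma, tol=1e-12, max_iter=200_000):
    """p_a proportional to phi_a**2, where phi is the Perron vector of
    W_ab = exp(-gamma (eps_a + eps_b) / 2) on nearest-neighbor edges, Eq. (15)."""
    half = [math.exp(-gamma * e / 2) for e in eps]   # W_ab = half[a] * half[b]
    phi = [1.0] * len(eps)
    for _ in range(max_iter):
        # Power iteration on W + I (the grid is bipartite, so plain W would oscillate).
        new = [half[a] * sum(half[b] * phi[b] for b in nbrs[a]) + phi[a]
               for a in range(len(eps))]
        top = max(new)
        new = [x / top for x in new]
        done = max(abs(x - y) for x, y in zip(new, phi)) < tol
        phi = new
        if done:
            break
    z = sum(x * x for x in phi)
    return [x * x / z for x in phi]


N = 12
nbrs, eps = grid_neighbors(N), energies(N)
results = {g: stationary_detailed_balance(nbrs, eps, g) for g in (0.0, 0.05)}
center, corner = (N // 2) * N + N // 2, 0
for g, p in results.items():
    print(f"gamma = {g}: p(center) = {p[center]:.5f}, p(corner) = {p[corner]:.6f}")
```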

In Fig. 2, we show the stationary distribution for $\gamma = 0$, $0.005$, and $0.05$. $\gamma = 0$ denotes the absence of an energy constraint. In this case, the particle localizes near the center of the lattice, a striking departure from the microcanonical maxEnt distribution, which predicts equal probabilities for all states. The entropic localization results from the higher multiplicity of paths in the central region compared to the boundaries […]. When average energy constraints are imposed ($\gamma > 0$), the particle balances the entropic multiplicity of paths against the energetic unfavorability of states. This balance is reminiscent of entropy/enthalpy compensation [21], which is well known in chemistry. At $\gamma = 0.05$, the particle spontaneously localizes in one of the four corners. Instead of choosing low-energy regions near the vertical and horizontal boundaries, the particle chooses regions near the diagonals because of their higher path multiplicity.

Thus, asymmetry in state space topology has a large impact on the stationary distribution. Are the state-based maxEnt and maximum path entropy distributions equal when all states are topologically equivalent? The answer is no. Consider a periodic $N \times N$ square lattice. The only topological restriction is that, in a single time step, the particle is allowed to jump to only a finite number of states.

FIG. 3. The maximum path entropy stationary distribution $p_a$ and the Boltzmann distribution $q_a$ when average energy constraints are imposed. The particle is allowed to jump to its nearest neighbors (A, top) and up to its third nearest neighbors (B, bottom).

In Fig. 3, we plot the stationary distribution (Eq. (9)), the Boltzmann distribution $q_a \propto e^{-\beta\epsilon_a}$, and their ratio after constraining the mean energy. $p_a$ is calculated as above, with the slight modification that the underlying connectivity graph represents a periodic lattice. $\gamma$ (see Eq. (15)) is fixed at $0.025$. The inverse temperature $\beta$ is adjusted to match the numerical value of the mean energy, which allows a direct comparison. We study two different state space topologies. On the top (A), we allow the particle to jump to any one of its nearest neighbors in a single transition. On the bottom (B), we allow the particle to jump up to a Hamming distance of three away. In both cases, $p_a$ is significantly different from $q_a$, especially in the region of high energy. How do we understand this difference? On the one hand, the maxEnt distribution $q_a$ depends solely on the state energy $\epsilon_a$. On the other hand, Eq. (12) shows that paths that visit states of both high and low energy have a non-negligible probability, thereby increasing the stationary probability $p_a$ of high-energy states compared to $q_a$. As the dynamical reach of the particle is increased from the first nearest neighbor to the third nearest neighbor, the difference between the maxEnt distribution and the maximum path entropy distribution gets smaller. The mean of the absolute log ratio of the probabilities decreases from $\approx 0.75$ to $\approx 0.5$ (it is 0 for identical distributions). Indeed, if the particle can jump from any state to any other state in a single transition, the maxEnt and the maximum path entropy predictions are trivially identical to each other […].
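The comparison between $p_a$ and a Boltzmann distribution matched to the same mean energy can be sketched as follows. The sketch makes several assumptions. It uses a small periodic lattice ($N = 8$), an illustrative $\gamma = 0.5$ rather than the value used in Fig. 3, and centered coordinates in Eq. (14). 'Reach' is encoded as the set of periodic displacements with $0 < \Delta x^2 + \Delta y^2 \le r^2_{\max}$, so $r^2_{\max} = 4$ covers up to third nearest neighbors. $\beta$ is found by bisection, since the Boltzmann mean energy decreases monotonically with $\beta$. The all-to-all case includes self-loops. There $W_{ab} = e^{-\gamma\epsilon_a/2}e^{-\gamma\epsilon_b/2}$ has rank one, so $p_a \propto e^{-\gamma\epsilon_a}$ and the two distributions coincide with $\beta = \gamma$.

```python
import math


def energies(N, A=11.0, B=10.0):
    c = (N - 1) / 2  # assumed: coordinates measured from the lattice center
    return [A / ((i - c) ** 2 + (j - c) ** 2 + B) for i in range(N) for j in range(N)]


def periodic_neighbors(N, r2max=None):
    """Periodic moves with 0 < dx^2 + dy^2 <= r2max; None means all-to-all with self-loops."""
    n = N * N
    if r2max is None:
        return [list(range(n)) for _ in range(n)]
    R = math.isqrt(r2max)
    nbrs = []
    for i in range(N):
        for j in range(N):
            s = {((i + dx) % N) * N + (j + dy) % N
                 for dx in range(-R, R + 1) for dy in range(-R, R + 1)
                 if 0 < dx * dx + dy * dy <= r2max}
            s.discard(i * N + j)
            nbrs.append(sorted(s))
    return nbrs


def path_entropy_stationary(nbrs, eps, gamma, tol=1e-13, max_iter=100_000):
    """p_a proportional to phi_a**2, phi = Perron vector of W_ab = exp(-gamma (eps_a + eps_b)/2)."""
    half = [math.exp(-gamma * e / 2) for e in eps]
    phi = [1.0] * len(eps)
    for _ in range(max_iter):
        new = [half[a] * sum(half[b] * phi[b] for b in nbrs[a]) + phi[a]
               for a in range(len(eps))]
        top = max(new)
        new = [x / top for x in new]
        done = max(abs(x - y) for x, y in zip(new, phi)) < tol
        phi = new
        if done:
            break
    z = sum(x * x for x in phi)
    return [x * x / z for x in phi]


def boltzmann_matched(eps, target, lo=-100.0, hi=100.0, steps=200):
    """Boltzmann q_a ~ exp(-beta eps_a) with beta chosen by bisection so <eps>_q = target."""
    e0 = min(eps)

    def dist(beta):
        w = [math.exp(-beta * (e - e0)) for e in eps]
        z = sum(w)
        return [x / z for x in w]

    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if sum(q * e for q, e in zip(dist(mid), eps)) > target:
            lo = mid  # mean energy too high: increase beta
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    return beta, dist(beta)


N, gamma = 8, 0.5
eps = energies(N)
results = {}
for label, r2max in (("nearest", 1), ("up to 3rd nearest", 4), ("all-to-all", None)):
    p = path_entropy_stationary(periodic_neighbors(N, r2max), eps, gamma)
    target = sum(pa * e for pa, e in zip(p, eps))
    beta, q = boltzmann_matched(eps, target)
    dev = sum(abs(math.log(pa / qa)) for pa, qa in zip(p, q)) / len(p)
    results[label] = {"beta": beta, "dev": dev, "target": target, "q": q}
    print(f"{label:>18}: beta = {beta:.4f}, mean |log(p/q)| = {dev:.4f}")
```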

FIG. 4. The change in the maximum path entropy stationary distribution in the presence of a non-equilibrium current. Net currents across the boundaries of a system allow regions of high energy to be visited frequently, and vice versa for regions of low energy. As $\alpha$ increases (from left to right), the stationary probability of states near $Y = 0$ and $X = \pm 20$ increases and the probability of states near $Y = \pm 20$ and $X = 0$ decreases.

In addition to state-dependent quantities like energy, one may wish to constrain path-dependent quantities, like currents. How do path-dependent constraints change the stationary distribution? Let us consider the periodic $N \times N$ square lattice as above. We constrain the average energy and a current along the positive $Y$ axis (see Fig. 1 and Fig. 4). To obtain the stationary distribution, we first identify the asymmetric constraint matrix

$$
W_{ab} = \exp\Big[-\gamma\,\frac{\epsilon_a + \epsilon_b}{2} - \alpha J_{ab}\Big] \tag{16}
$$

when $a$ and $b$ are nearest neighbors and zero otherwise. As above, $\gamma$ is the Lagrange multiplier associated with energy, and $\alpha$ is associated with current. The current in the positive $Y$ direction between states $a = (x, y)$ and $b = (z, w)$ is defined as $J_{ab} = \pm 1$ if $w = y \pm 1$, with appropriate corrections at $y, w = 1, N$. $J_{ab}$ is zero for sideways movement. Note that $J_{ab}$ is antisymmetric and therefore contributes to entropy production (Eq. (10)). We find the left and right Perron-Frobenius eigenvectors $\psi$ and $\phi$ of $W$. The stationary distribution is the product of these two vectors, $p_a = \psi_a\phi_a$.
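Below is a minimal sketch of the non-equilibrium case. It assumes a small periodic lattice ($N = 8$), illustrative values $\gamma = 0.5$ and $\alpha = 0.3$, and centered coordinates in Eq. (14). It also assumes a sign convention in which $W_{ab}$ contains the factor $e^{-\alpha J_{ab}}$, so that $\alpha > 0$ disfavors jumps in $+Y$. Because $W_{ab}/W_{ba} = e^{-2\alpha J_{ab}}$, Eq. (10) then predicts $s = -2\alpha\langle J\rangle$. The last line compares this prediction with a direct evaluation.

```python
import math

N, gamma, alpha = 8, 0.5, 0.3   # illustrative values on a small periodic lattice
c = (N - 1) / 2
eps = [11.0 / ((x - c) ** 2 + (y - c) ** 2 + 10.0) for x in range(N) for y in range(N)]
n = N * N

# Moves (dx, dy, J): J = +1 for a step in +Y, -1 for -Y, 0 sideways; periodic wrap.
moves = [(1, 0, 0), (-1, 0, 0), (0, 1, 1), (0, -1, -1)]
W = [dict() for _ in range(n)]
J = [dict() for _ in range(n)]
for x in range(N):
    for y in range(N):
        a = x * N + y
        for dx, dy, jab in moves:
            b = ((x + dx) % N) * N + (y + dy) % N
            # Assumed sign convention: alpha > 0 penalizes +Y jumps.
            W[a][b] = math.exp(-gamma * (eps[a] + eps[b]) / 2 - alpha * jab)
            J[a][b] = jab


def power(apply, tol=1e-13, max_iter=100_000):
    """Max-normalized dominant vector of (M + I), where apply(v) returns M v."""
    v = [1.0] * n
    for _ in range(max_iter):
        new = [m + vi for m, vi in zip(apply(v), v)]
        top = max(new)
        new = [x / top for x in new]
        done = max(abs(x - y) for x, y in zip(new, v)) < tol
        v = new
        if done:
            break
    return v


def right(v):  # (W v)_a = sum_b W_ab v_b
    return [sum(w * v[b] for b, w in W[a].items()) for a in range(n)]


def left(v):   # (v W)_b = sum_a v_a W_ab
    out = [0.0] * n
    for a in range(n):
        for b, w in W[a].items():
            out[b] += v[a] * w
    return out


phi, psi = power(right), power(left)
eta = right(phi)[0] / phi[0]
norm = sum(x * y for x, y in zip(psi, phi))
p = [x * y / norm for x, y in zip(psi, phi)]                      # p_a = psi_a phi_a
k = [{b: phi[b] * w / (eta * phi[a]) for b, w in W[a].items()} for a in range(n)]

inflow = [0.0] * n                                                 # sum_a p_a k_ab
for a in range(n):
    for b, kab in k[a].items():
        inflow[b] += p[a] * kab
mean_J = sum(p[a] * kab * J[a][b] for a in range(n) for b, kab in k[a].items())
s = sum(p[a] * kab * math.log(kab / k[b][a]) for a in range(n) for b, kab in k[a].items())
print(mean_J, s, -2 * alpha * mean_J)
```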

Fig. 4 shows the stationary distribution at $\alpha = 0$, $0.1$, and $0.5$, with $\gamma$ held fixed at $\gamma = 0.025$. At $\alpha = 0$, there are no net currents and the stationary distribution is governed entirely by the energy constraints. When we increase $\alpha$ to $0.1$ (center) and $0.5$ (right), we see that net currents modulate the stationary distribution, a fact well known in statistical physics […]. This effect can be understood by looking at path probabilities. From Eq. (12), we know that paths that traverse high-energy regions have a low probability. But this may be alleviated if they simultaneously carry a net favorable current. This leads to a higher probability for energetically unfavorable states that are represented frequently in current-carrying paths.

In summary, Figs. 2, 3, and 4 show that asymmetry in state space topology, the finite dynamical reach of states, and path-dependent constraints can all alter the inferred stationary distribution in a non-trivial fashion. These effects will likely be magnified in higher dimensions and are relevant in many discrete-state systems where state-based maxEnt has previously been employed […]. It will be interesting to see whether these additional features lead to better predictive models.


We discussed how dynamical information affects the estimate of the inferred state probabilities. But we also have access to the path probabilities (see Eq. (12)). What is the relevance of the inferred Markovian dynamics to the study of diffusive random walks in general? We offer a speculation. The two mathematical frameworks used to describe random walks, the local Fokker-Planck formulation and the non-local path-integral formulation, are often equivalent. For example, the local assertion that all nearest-neighbor jumps on an infinite regular lattice are equiprobable is equivalent to the non-local assertion that all paths of equal duration are equiprobable. But confinement and lattice irregularities lead to prominent localization away from the boundary, a striking difference between the two approaches […]. This localization is usually explained by invoking fictitious entropic forces in the Fokker-Planck approach. We believe that path-based approaches may be better descriptors of stochastic dynamics, especially for discrete and finite systems such as spin systems and chemical reaction networks. We leave this for future theoretical and experimental studies.
DERIVATION OF THE MARKOV CHAIN

For notational simplicity, we consider the Caliber with only one constraint $r_{ab}$. Generalization to multiple constraints is straightforward.

$$
\begin{aligned}
\mathcal{C} = &-\sum_{a,b} p_a k_{ab}\log k_{ab} + \sum_a m_a\Big(\sum_b p_a k_{ab} - p_a\Big) + \sum_b n_b\Big(\sum_a p_a k_{ab} - p_b\Big) \\
&+ \delta\Big(\sum_{a,b} p_a k_{ab} - 1\Big) - \gamma\Big(\sum_{a,b} p_a k_{ab} r_{ab} - \langle r\rangle\Big)
\end{aligned} \tag{17}
$$

As above, all summations involving two indices are restricted to edges of the graph.

Differentiating the Caliber with respect to $k_{ab}$, we have

$$
p_a(\log k_{ab} + 1) = p_a(m_a + n_b + \delta - \gamma r_{ab}) \;\Rightarrow\; k_{ab} = e^{m_a + n_b + \delta - 1 - \gamma r_{ab}}. \tag{18}
$$

Differentiating the Caliber with respect to $p_a$, we have

$$
0 = -\sum_b k_{ab}\log k_{ab} + m_a\sum_b k_{ab} - m_a + \sum_b n_b k_{ab} - n_a + \delta\sum_b k_{ab} - \gamma\sum_b k_{ab} r_{ab}. \tag{19}
$$

Substituting $k_{ab}$ from Eq. (18) and using $\sum_b k_{ab} = 1$, we get

$$
m_a + n_a = 1. \tag{20}
$$

Substituting this in Eq. (18), we get

$$
k_{ab} = \frac{\phi_b}{\eta\,\phi_a}\,W_{ab}. \tag{21}
$$

Here, $W_{ab} = e^{-\gamma r_{ab}}$ when $(a,b) \in E$ and zero otherwise, $\phi_a = e^{n_a}$, and $\eta = e^{-\delta}$. Imposing $\sum_b k_{ab} = 1$, we have

$$
\sum_b W_{ab}\phi_b = \eta\,\phi_a. \tag{22}
$$

Given that $W$ is irreducible and non-negative, it has a Perron-Frobenius eigenvalue that is positive and whose corresponding eigenvector has positive elements. Given that the solution to the Caliber maximization problem is unique, if we choose $\phi$ to be the Perron-Frobenius vector, we obtain the transition matrix elements

$$
k_{ab} = \frac{\phi_b}{\eta\,\phi_a}\,W_{ab} \tag{23}
$$

when $(a,b) \in E$ and zero otherwise.

IMPOSING DETAILED BALANCE

As above, we consider the Caliber

$$
\begin{aligned}
\mathcal{C} = &-\sum_{a,b} p_a k_{ab}\log k_{ab} + \sum_a m_a\Big(\sum_b p_a k_{ab} - p_a\Big) + \sum_b n_b\Big(\sum_a p_a k_{ab} - p_b\Big) + \delta\Big(\sum_{a,b} p_a k_{ab} - 1\Big) \\
&+ \sum_{a,b}\lambda_{ab}\big(p_a k_{ab} - p_b k_{ba}\big) - \gamma\Big(\sum_{a,b} p_a k_{ab} r_{ab} - \langle r\rangle\Big)
\end{aligned} \tag{24}
$$

We have introduced Lagrange multipliers $\lambda_{ab}$ to enforce detailed balance. As above, all summations involving two indices are restricted to edges of the graph.

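Differentiating the Caliber with respect to $k_{ab}$, we have

$$
p_a(\log k_{ab} + 1) = p_a m_a + p_a n_b + p_a\delta + p_a(\lambda_{ab} - \lambda_{ba}) - p_a\gamma r_{ab} \tag{25}
$$

$$
\Rightarrow\; k_{ab} = e^{m_a + n_b + \delta - 1 + \lambda_{ab} - \lambda_{ba} - \gamma r_{ab}}. \tag{26}
$$

Differentiating the Caliber with respect to $p_a$, we have

$$
0 = -\sum_b k_{ab}\log k_{ab} + m_a\sum_b k_{ab} - m_a + \sum_b n_b k_{ab} - n_a + \delta\sum_b k_{ab} + \sum_b k_{ab}(\lambda_{ab} - \lambda_{ba}) - \gamma\sum_b r_{ab} k_{ab}. \tag{27}
$$

Substituting $k_{ab}$ from Eq. (26) and using $\sum_b k_{ab} = 1$, we get

$$
m_a + n_a = 1. \tag{28}
$$

Substituting this in Eq. (26), we get

$$
k_{ab} = \frac{\pi_b}{\eta\,\pi_a}\,e^{-\gamma r_{ab}}\,K_{ab}. \tag{29}
$$

Here, $\pi_a = e^{n_a}$, $\eta = e^{-\delta}$, and $K_{ab} = e^{\lambda_{ab} - \lambda_{ba}}$. Notice that $K_{ab}K_{ba} = 1$.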

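To determine $K_{ab}$, we impose detailed balance:

$$
\frac{k_{ab}}{k_{ba}} = \frac{\pi_b^2}{\pi_a^2}\,e^{-\gamma r_{ab} + \gamma r_{ba}}\,K_{ab}^2 = \frac{p_b}{p_a} \tag{30}
$$

$$
\Rightarrow\; K_{ab} = \frac{\pi_a}{\pi_b}\sqrt{\frac{p_b}{p_a}}\;e^{\frac{\gamma}{2}(r_{ab} - r_{ba})}. \tag{31}
$$

Thus, the transition probabilities are

$$
k_{ab} = \frac{\pi_b}{\eta\,\pi_a}\,e^{-\gamma r_{ab}}\,\frac{\pi_a}{\pi_b}\sqrt{\frac{p_b}{p_a}}\;e^{\frac{\gamma}{2}(r_{ab} - r_{ba})} \tag{32}
$$

$$
= \frac{1}{\eta}\sqrt{\frac{p_b}{p_a}}\;e^{-\frac{\gamma}{2}(r_{ab} + r_{ba})}. \tag{33}
$$

Let $\phi_a = \sqrt{p_a}$ and $W_{ab} = e^{-\frac{\gamma}{2}(r_{ab} + r_{ba})}$ when $(a,b) \in E$ and zero otherwise. Using $\sum_b k_{ab} = 1$, we have

$$
\sum_b W_{ab}\phi_b = \eta\,\phi_a. \tag{34}
$$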
Thus, $\phi$, the vector of square roots of probabilities, is the eigenvector of $W$ with eigenvalue $\eta$. In other words, imposing detailed balance is equivalent to constraining a symmetrized form of the constraints.
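A numerical check of this equivalence follows as a minimal sketch. It assumes a hypothetical four-state complete graph and an arbitrary asymmetric $r_{ab}$. It builds $W$ from the symmetrized constraint, takes $p_a = \phi_a^2$ (normalized) and $k_{ab} = \sqrt{p_b/p_a}\,W_{ab}/\eta$, and then verifies two things: detailed balance holds, and $\langle r\rangle = \langle \bar r\rangle$ under the resulting reversible chain.

```python
import math


def perron(M, tol=1e-13, max_iter=100_000):
    """Perron eigenvalue and right eigenvector of M via power iteration on M + I."""
    size = len(M)
    v = [1.0] * size
    for _ in range(max_iter):
        new = [sum(M[i][j] * v[j] for j in range(size)) + v[i] for i in range(size)]
        top = max(new)
        new = [x / top for x in new]
        converged = max(abs(x - y) for x, y in zip(new, v)) < tol
        v = new
        if converged:
            break
    return sum(M[0][j] * v[j] for j in range(size)) / v[0], v


n, gamma = 4, 0.8
# Hypothetical asymmetric constraint on a complete graph without self-loops.
r = [[0.0, 1.0, 0.2, 0.5],
     [0.3, 0.0, 1.5, 0.1],
     [0.9, 0.4, 0.0, 1.2],
     [0.2, 0.8, 0.6, 0.0]]
rbar = [[0.5 * (r[a][b] + r[b][a]) for b in range(n)] for a in range(n)]
W = [[math.exp(-gamma * rbar[a][b]) if a != b else 0.0 for b in range(n)] for a in range(n)]

eta, phi = perron(W)
z = sum(x * x for x in phi)
p = [x * x / z for x in phi]                                          # p_a = phi_a**2
k = [[math.sqrt(p[b] / p[a]) * W[a][b] / eta for b in range(n)] for a in range(n)]

db_residual = max(abs(p[a] * k[a][b] - p[b] * k[b][a]) for a in range(n) for b in range(n))
avg_r = sum(p[a] * k[a][b] * r[a][b] for a in range(n) for b in range(n))
avg_rbar = sum(p[a] * k[a][b] * rbar[a][b] for a in range(n) for b in range(n))
print(db_residual, avg_r, avg_rbar)
```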



